1  Overview

We investigated the problem of secure information exchange and integration
across organization and domain boundaries using policy management. Policy
management is a popular method for enforcing flexible and modifiable security
constraints, but this popularity has led to the development of several policy
languages tailored to domain- and application-specific conditions. This makes
cross-domain collaboration and data sharing very difficult, and almost
impossible without prior negotiation. The problem is further exacerbated when
several domains are involved in a transaction, as in federated querying across
multiple data sources. We developed a query federation test-bed that allows
users to query multiple policy-controlled Semantic Web datasets simultaneously.
This system performs on-the-fly mash-ups of sensitive data, access to which
must be regulated. It also supports multi-ontology federation, which allows
users to write queries in their own ontologies without worrying about the
ontologies of the datasets in the federation. We studied policy languages such
as XACML and AIR and modeled them in W3C’s Rule Interchange Format (RIF) to
enable dynamic translation between policies in these commonly used policy
languages. Lastly, we studied how secure information sharing would benefit
public education, using Massachusetts as a use case.

2  Security  Policy  Languages 

We proposed that policy languages that are used for information sharing have
a common subset of semantics defining certain common features or concepts.
Our plan was to study several policy languages and express their semantics in
RIF, which is a standard for exchanging rules on the Web. This would help us
identify the common RIF subset for security policies that would act as a
policy interlingua. This policy interlingua would enable domains to continue
using their own policy languages within the domain, and provide a certain
minimum expressivity for collaborations and information sharing across domains.

We studied several policy languages, including the eXtensible Access Control
Markup Language (XACML) [9] and AIR (Accountability In RDF) [6]. XACML is an
OASIS standard language for the specification of access control policies.
Earlier, we showed how the semantics of XACML could be expressed in RIF-PRD
(Production Rule Dialect) via an intermediate Datalog representation. We then
defined a translation between XACML and RIF that allows XACML and non-XACML
systems to collaborate while maintaining their security policies. More
recently, we have been working on doing the same with AIR. A future goal is to
use these RIF-PRD translations to define a common subset in RIF-PRD that will
form the policy interlingua.
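As a rough illustration of what an intermediate Datalog step might look like, the sketch below renders a drastically simplified, XACML-like rule as a Datalog-style clause. The `SimpleRule` class, the predicate names, and the output syntax are all assumptions made for illustration; this is not the translation defined in our work, and real XACML policies (targets, conditions, combining algorithms) are far richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleRule:
    """A hypothetical, heavily simplified stand-in for an XACML rule."""
    effect: str    # "Permit" or "Deny"
    role: str      # required subject role
    resource: str  # resource identifier
    action: str    # action identifier


def to_datalog(rule: SimpleRule) -> str:
    """Render the rule as a Datalog-style clause (illustrative syntax only)."""
    head = f"{rule.effect.lower()}(S, R, A)"
    body = [
        f'role(S, "{rule.role}")',
        f'resource(R, "{rule.resource}")',
        f'action(A, "{rule.action}")',
    ]
    return f"{head} :- {', '.join(body)}."


rule = SimpleRule("Permit", "teacher", "student_records", "read")
clause = to_datalog(rule)
print(clause)
```

A clause of this shape could then be mapped to a production rule; which RIF-PRD constructs that mapping would use is outside the scope of this sketch.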

2.1  AIR 

AIR (Accountability In RDF) is an extension to N3Logic [3] and is structured
to meet the provenance and reusability requirements of Web information
systems. In addition to the N3Logic features of scoped negation, scoped
contextualized reasoning, nested graphs, and built-in functions, AIR supports
Linked Rules and focuses on generating useful justifications for all actions
taken by the reasoner. Like N3Logic, AIR is represented in N3 [2], which
provides a human-readable syntax for a superset of the Resource Description
Framework (RDF). N3 extends the RDF data model by allowing for the
quantification of variables as URIs with the @forAll and @forSome directives.
It also permits the inclusion of nested graphs by using curly braces to quote
subgraphs. AIR is made up of a set of built-in functions and two independent
ontologies: the first is for the specification of AIR rules, and the second
describes justifications for the inferences made by AIR rules. The built-in
functions allow rules to access Web resources, query SPARQL endpoints, and
perform scoped contextualized reasoning, as well as basic math, string, and
cryptographic operations. While developing the rule ontology, we focused on
capturing how real-world rules and laws are written so that they can be
represented naturally in AIR. For the justification ontology, our focus was on
reusability of justifications and on automated proof checking. Given some AIR
rules, defined in the AIR rules ontology, and some Semantic Web data as input,
the AIR reasoner produces a set of inferences that are annotated with
justifications. The runtime input to AIR rules can be any RDF graph, or an
empty graph if the rules only access Web resources.
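The idea of inferences annotated with justifications can be illustrated with a minimal sketch. The code below is not the AIR reasoner: it is a toy forward-chaining loop over ground triples (no variables, no N3 syntax, no scoped negation), and the `Rule` class, the rule names, and the sample facts are hypothetical. It only shows how each derived fact can carry the name of the rule that produced it and the premises that were matched.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

Fact = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass(frozen=True)
class Rule:
    """Hypothetical ground rule: if all premises hold, assert the conclusion."""
    name: str
    premises: FrozenSet[Fact]
    conclusion: Fact


def infer(facts, rules) -> Dict[Fact, Tuple[str, List[Fact]]]:
    """Forward-chain to a fixpoint, recording why each new fact was derived."""
    known = set(facts)
    justifications: Dict[Fact, Tuple[str, List[Fact]]] = {}
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.premises <= known and rule.conclusion not in known:
                known.add(rule.conclusion)
                # The justification names the rule and the facts it matched.
                justifications[rule.conclusion] = (rule.name, sorted(rule.premises))
                changed = True
    return justifications


facts = {("alice", "role", "teacher"), ("req1", "requester", "alice")}
rules = [
    Rule("grant", frozenset({("req1", "status", "authorized")}),
         ("req1", "compliantWith", "policy")),
    Rule("authorize", frozenset(facts), ("req1", "status", "authorized")),
]
result = infer(facts, rules)
for fact, (rule_name, premises) in result.items():
    print(fact, "because of", rule_name, "given", premises)
```

Because every inference records its rule and premises, a separate checker could replay the chain without rerunning the reasoner, which is the spirit of the automated proof checking mentioned above.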

Please  refer  to  [8]  for  information  about  the  semantics  of  the  AIR  language 
and  to  [7]  for  information  about  the  translation  of  AIR  to  RIF. 

3  Query  Federation 

Federated querying, or federated search, is the concurrent search of multiple
distributed data sources. It enables users and applications to issue a single
query to the federation engine, which then converts it into multiple queries
against the distributed data sources and returns the merged result of those
queries. The federation engine we developed provides transparent access to
multiple data sources. However, the lack of a shared model for security and
privacy requirements impedes this transparency, as the federation engine is
unable to process the different requirements of each data source and obtain
appropriate credentials from the requester. This causes most federations to
require prior setup and negotiation of policy, and prevents the dynamic
integration of data from these data sources. By incorporating our policy
interoperability technologies into our federation engine, we will enable
dynamic, secure query federation over distributed data sources with disparate
policy languages.

Fig. 1. Query Federation Architecture (labels recoverable from the figure: SSL
Module, Validator, SPARQL Query + Policy Proof, SPARQL Endpoint, Proof Checker)
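The fan-out-and-merge behavior described in this section can be sketched as follows. This is an illustrative sketch only, not our federation engine: the data sources are hypothetical in-memory tables rather than SPARQL endpoints, and the names `SOURCES`, `query_source`, and `federated_query` are assumptions introduced for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for remote data sources.
SOURCES = {
    "people": [
        {"person": "p1", "name": "Ada"},
        {"person": "p2", "name": "Bob"},
        {"person": "p3", "name": "Cy"},
    ],
    "places": [
        {"person": "p1", "birthPlace": "Berlin"},
        {"person": "p2", "birthPlace": "Paris"},
    ],
}


def query_source(name, wanted_keys):
    """Return the rows of one source, restricted to the requested keys."""
    return [{k: row[k] for k in wanted_keys if k in row} for row in SOURCES[name]]


def federated_query(subqueries, join_key):
    """Run the subqueries concurrently and merge rows that share join_key."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(query_source, name, keys) for name, keys in subqueries]
        result_sets = [future.result() for future in futures]
    merged, seen = {}, {}
    for index, rows in enumerate(result_sets):
        for row in rows:
            key = row[join_key]
            merged.setdefault(key, {}).update(row)
            seen.setdefault(key, set()).add(index)
    # Inner join: keep only keys for which every source contributed a row.
    return [merged[k] for k in sorted(merged) if len(seen[k]) == len(result_sets)]


results = federated_query(
    [("people", ["person", "name"]), ("places", ["person", "birthPlace"])],
    join_key="person",
)
print(results)
```

The caller issues one logical query and never addresses an individual source, which is the transparency property discussed above. Nothing in this sketch handles per-source policies or credentials.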


4  Architecture 

We designed a federation algorithm for Semantic Web sources [4] and
implemented a test-bed. This included designing an ontology (see footnote 1)
for describing Semantic data sources and their policies. The system is
illustrated in Figure 1, and a screenshot is shown in Figure 2. Its main
components are (i) the Validator, which validates the query provided by the
user; (ii) the Mapper, which splits the query into several subqueries based on
descriptions of endpoints; (iii) the Optimizer, which reorders the subqueries
according to the optimization metrics; (iv) the Orchestrator, which executes
the subqueries and integrates the various result sets; and (v) the Proof
Generator, which, if necessary, generates a proof for each secure SPARQL
endpoint based on client-supplied credentials and endpoint descriptions. The
Federation Engine looks up the source descriptions of the Semantic data sources


1  http://dig.csail.mit.edu/2009/AFOSR/service-description.n3


[Screenshot of the SPARQL Federator web form of the Decentralized Information
Group's Semantic Federation Engine. The form accepts a single query and
attempts to find a solution in the endpoints registered with the system. Four
DBpedia datasets are hosted on four registered endpoints: Mapping-based
Infoboxes, People Data, Article Categories, and Category Labels. Sample queries
offered: Hollywood actors born in Paris; German musicians from Berlin; American
Presidents and their vocations; athletes who played in the NBA and Minor League
Baseball; a translation test; and "input your own". The query shown, for German
musicians who were born in Berlin, is:]

PREFIX dbpedia: <http://dbpedia.org/ontology/>
PREFIX dbp_resource: <http://dbpedia.org/resource/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?n ?b ?label WHERE {
  ?p foaf:name ?n .
  ?p dbpedia:birthDate ?b .
  ?p dbpedia:birthPlace <http://dbpedia.org/resource/Berlin> .
  ?p dcterms:subject <http://dbpedia.org/resource/Category:German_musicians> .
  <http://dbpedia.org/resource/Category:German_musicians> rdfs:label ?label .
}

Fig. 2. Screenshot of Federation System


(also known as SPARQL endpoints) that have registered with it. The Map
Generator utility generates a set of mapping rules based on these source
descriptions, which is used by the Mapper. The system functions as follows. A
client submits a query to the Federation Engine through a web form. The
Validator validates the query and forwards it to the Mapper. The Mapper
rewrites the query into subqueries based on the source descriptions known to
the Federation Engine. Once the mapping is done, the Optimizer reorders the
subqueries according to its optimization metrics. If any of the endpoints in
the query plan requires specific credentials for data access, execution is
halted and the user is prompted to resend the query with the additional
credentials. The Proof Generator then generates a proof based on the
user-supplied credentials. The optimized list of subqueries, along with any
generated proofs, is forwarded to the Orchestrator. The Orchestrator sends the
subqueries, along with their proofs, to the various endpoints, integrates the
different result sets, and returns the final result to the client on the web
form.
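The workflow above can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the implementation described in [4]: the endpoint descriptions in `ENDPOINTS`, the size-based ordering metric, the placeholder proof strings, and all function names are hypothetical. The Orchestrator's execution and merging step is left out; the sketch stops at the plan and proofs that would be handed to it.

```python
class CredentialsRequired(Exception):
    """Raised when an endpoint in the plan needs credentials that were not sent."""


# Hypothetical endpoint descriptions: predicates served, size estimate, security flag.
ENDPOINTS = {
    "people": {"predicates": {"foaf:name", "dbpedia:birthPlace"}, "size": 500, "secure": False},
    "categories": {"predicates": {"dcterms:subject"}, "size": 100, "secure": True},
}


def validate(patterns):
    """Validator: accept only a non-empty list of (subject, predicate, object) triples."""
    if not patterns or any(len(p) != 3 for p in patterns):
        raise ValueError("query must be a non-empty list of triple patterns")
    return patterns


def map_query(patterns):
    """Mapper: group patterns into one subquery per endpoint that serves the predicate."""
    plan = {}
    for pattern in patterns:
        for name, description in ENDPOINTS.items():
            if pattern[1] in description["predicates"]:
                plan.setdefault(name, []).append(pattern)
                break
        else:
            raise ValueError(f"no endpoint serves {pattern[1]}")
    return plan


def optimize(plan):
    """Optimizer: order subqueries by an assumed metric (smallest endpoint first)."""
    return sorted(plan.items(), key=lambda item: ENDPOINTS[item[0]]["size"])


def generate_proofs(ordered, credentials):
    """Proof Generator: attach a placeholder proof for each secure endpoint."""
    proofs = {}
    for name, _ in ordered:
        if ENDPOINTS[name]["secure"]:
            if credentials is None:
                # Halt so the client can resend the query with credentials.
                raise CredentialsRequired(name)
            proofs[name] = f"proof({credentials}, {name})"
    return proofs


def plan_query(patterns, credentials=None):
    """Run the stages up to the point where the Orchestrator would take over."""
    ordered = optimize(map_query(validate(patterns)))
    return ordered, generate_proofs(ordered, credentials)


query = [("?p", "foaf:name", "?n"), ("?p", "dcterms:subject", "cat:German_musicians")]
try:
    plan_query(query)
    halted_on = None
except CredentialsRequired as exc:
    halted_on = str(exc)
ordered, proofs = plan_query(query, credentials="alice-token")
```

Here the first call halts on the secure endpoint, mirroring the prompt to resend the query with credentials, and the second call produces an ordered plan with one proof attached.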

Along  with  designing  and  developing  this  architecture,  we  also  evaluated  it 
extensively  with  different  kinds  of  queries  and  dataset  characteristics.  Please 
refer  to  [4]  for  more  details  about  the  evaluation. 

4.1  Multi-Ontology  Support

Along with policy interoperability, we also addressed cross-ontology
integration by incorporating mappings between ontologies. Because our
federation testbed supports the SPARQL Query Language for RDF (SPARQL) and
allows queries over Semantic Web data, it necessarily uses ontologies, or
formal representations of commonly used terms in a domain. SPARQL endpoints may
store their data using ontologies different from the ones used by the client.
It is desirable that clients be able to use their own ontologies without
worrying about those of remote databases. We added a module [1] to the
federation testbed that enables clients to write queries in their own
ontologies and translates these queries into the ontologies used by the SPARQL
endpoints in the federation. The modified system is illustrated in Figure 3.
The translated queries are submitted to the federation engine, which processes
them as described above.

Fig. 3. Federation with Multi-Ontology Mapping
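The query translation step in this subsection could look roughly like the sketch below. It assumes the simplest possible case, a one-to-one substitution of client terms by endpoint terms inside triple patterns; the `TERM_MAP` table, the `my:` prefix, and the function names are hypothetical, and the module described in [1] may handle richer mappings than this.

```python
# Hypothetical mapping from client-ontology terms to endpoint-ontology terms.
TERM_MAP = {
    "my:fullName": "foaf:name",
    "my:bornIn": "dbpedia:birthPlace",
}


def translate_pattern(pattern, term_map):
    """Rewrite each mapped term; variables and unmapped terms pass through unchanged."""
    return tuple(term_map.get(term, term) for term in pattern)


def translate_query(patterns, term_map):
    """Translate every triple pattern of a client query."""
    return [translate_pattern(pattern, term_map) for pattern in patterns]


client_query = [("?p", "my:fullName", "?n"), ("?p", "my:bornIn", "dbr:Berlin")]
endpoint_query = translate_query(client_query, TERM_MAP)
print(endpoint_query)
```

The client writes `my:fullName` and never sees `foaf:name`; the translated patterns are what would be handed to the federation engine.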

4.2  Use  Case:  Public  Education  in  Massachusetts 

Education is an important public sector service where interoperability and
secure information sharing can have tremendous benefits. In Massachusetts, the
Department of Elementary and Secondary Education (DESE; see footnote 2) is
responsible for the education of the approximately 550,000 children in the
state’s public schools, which are located in 391 school districts. Its mission
is "to improve the quality of the public education system so that students are
adequately prepared for higher education, rewarding employment, continued
education, and responsible citizenship." One of its six primary goals is the
provision of timely, useful information to stakeholders [5]. To achieve its
mission and goals, it is important for the DESE to track the progress of
students as they advance through the grades. Moreover, it is necessary to
address the needs of children in early childhood and in the post-secondary
years, when they are not in the purview of the DESE. Without such attention, we
would lack an active citizenry that sustains a vibrant democracy and an
educated working-age population that can grow our knowledge-based economy in a
globalized world.

2 Note: Massachusetts has had many reorganizations of the state-level education
administration in the last decade. In this thesis, the term DESE is used to
identify the Department of Elementary and Secondary Education as well as its
predecessors.

As part of this grant, we investigated the challenges involved in deploying
automated information sharing for this use case and have designed a prototype
system. Please refer to [4] for more details about this work.